Sampling with Minimum Sum of Squared Similarities for Nystrom-Based Large Scale Spectral Clustering

نویسندگان

  • Djallel Bouneffouf
  • Inanç Birol
چکیده

The Nyström sampling provides an efficient approach for large scale clustering problems, by generating a low-rank matrix approximation. However, existing sampling methods are limited by their accuracies and computing times. This paper proposes a scalable Nyström-based clustering algorithm with a new sampling procedure, Minimum Sum of Squared Similarities (MSSS). Here we provide a theoretical analysis of the upper error bound of our algorithm, and demonstrate its performance in comparison to the leading spectral clustering methods that use Nyström sampling.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Nyström Sampling Depends on the Eigenspec- Trum Shape of the Data

Spectral clustering has shown a superior performance in analyzing the cluster structure. However, its computational complexity limits its application in analyzing large-scale data. To address this problem, many low-rank matrix approximating algorithms are proposed, including the Nyström method – an approach with proven approximate error bounds. There are several algorithms that provide recipes ...

متن کامل

An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving the minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real world data sets.

متن کامل

Improvement of the Classification of Hyperspectral images by Applying a Novel Method for Estimating Reference Reflectance Spectra

Hyperspectral image containing high spectral information has a large number of narrow spectral bands over a continuous spectral range. This allows the identification and recognition of materials and objects based on the comparison of the spectral reflectance of each of them in different wavelengths. Hence, hyperspectral image in the generation of land cover maps can be very efficient. In the hy...

متن کامل

Fast Algorithms for Unsupervised Learning in Large Data Sets

The ability to mine and extract useful information automatically, from large datasets, is a common concern for organizations (having large datasets), over the last few decades. Over the internet, data is vastly increasing gradually and consequently the capacity to collect and store very large data is significantly increasing. Existing clustering algorithms are not always efficient and accurate ...

متن کامل

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015